Hummer: Mitigating Stragglers with Partial Clones

Authors

  • Jia LI
  • Changjian WANG
  • Dongsheng LI
  • Yiming ZHANG

Abstract

Small jobs, which are typically run for interactive data analyses in datacenters, are often delayed by long-running tasks called stragglers. Many approaches, such as blacklisting, speculative execution, and proactive mitigation, have been proposed to address this problem. However, they either take too long to react or waste too many resources. In this paper, we propose a new proactive method that mitigates stragglers by performing partial clones. It improves average job duration by 48% and 18% compared to LATE and Dolly, respectively.
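The abstract does not say how Hummer decides what to clone, so the following is only a minimal sketch of the general cloning idea it builds on, under stated assumptions. Here a "partial clone" is assumed to mean launching extra copies for only a fraction of a job's tasks. A task finishes when its fastest copy finishes, and a job finishes when its slowest task finishes. The task-time model (`sample_task_time`, a 10% straggler probability, and an 8x slowdown loosely echoing the figure quoted in the "Attack of the Clones" summary listed below) and every function name are hypothetical. This is not the authors' algorithm and reproduces none of the paper's results.

```python
import random
from statistics import mean


def sample_task_time(rng: random.Random, straggler_prob: float = 0.1,
                     base: float = 1.0, slowdown: float = 8.0) -> float:
    """Hypothetical task-time model: a copy normally takes about `base`
    seconds, but with probability `straggler_prob` it runs `slowdown` times slower."""
    t = base * rng.uniform(0.9, 1.1)
    if rng.random() < straggler_prob:
        t *= slowdown
    return t


def job_duration(primary: list[float], clones_per_task: list[list[float]]) -> float:
    """A task completes when its fastest copy (primary or clone) completes;
    the job completes when its last task completes."""
    return max(min([p] + extra) for p, extra in zip(primary, clones_per_task))


def simulate(num_tasks: int, clone_fraction: float, copies: int,
             rng: random.Random) -> tuple[float, float, int]:
    """Compare one job with no clones against the same job where only a
    fraction of its tasks get `copies` extra clones.

    Assumption: the first `clone_fraction` of tasks are cloned; a real
    scheduler would choose which tasks to clone, e.g. under a resource budget.
    """
    primary = [sample_task_time(rng) for _ in range(num_tasks)]
    num_cloned = round(num_tasks * clone_fraction)
    clones = [
        [sample_task_time(rng) for _ in range(copies)] if i < num_cloned else []
        for i in range(num_tasks)
    ]
    no_clone = job_duration(primary, [[] for _ in primary])
    partial = job_duration(primary, clones)
    extra_slots = num_cloned * copies  # additional task slots consumed
    return no_clone, partial, extra_slots


if __name__ == "__main__":
    rng = random.Random(42)
    trials = [simulate(num_tasks=10, clone_fraction=0.5, copies=1, rng=rng)
              for _ in range(1000)]
    print("mean job time, no clones:      %.2f" % mean(t[0] for t in trials))
    print("mean job time, partial clones: %.2f" % mean(t[1] for t in trials))
    print("extra task slots per job:      %d" % trials[0][2])
```

Because every cloned task keeps its primary copy, the partial-clone job time can never exceed the no-clone time for the same draws. The cost appears as `extra_slots`, the additional task slots consumed, which is the resource-waste concern the abstract raises about existing approaches.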


Similar resources

Gradient Coding: Avoiding Stragglers in Distributed Learning

We propose a novel coding theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for synchronous Gradient Descent. We implement our schemes in python (using MPI) to run on Amazon EC2, and show how we compare against baseline approaches in running time and ge...

Gradient Coding

We propose a novel coding theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for synchronous Gradient Descent. We implement our scheme in MPI and show how we compare against baseline architectures in running time and generalization error.

Fine-Grained Micro-Tasks for MapReduce Skew-Handling

Recent work on MapReduce has considered the problems of skew, where a job’s tasks exhibit large variance in size and processing cost, and stragglers, tasks that run slowly due to conditions on particular nodes. In this paper, we discuss an extremely simple approach to mitigating skew and stragglers: break the workload into many small tasks that are dynamically scheduled at runtime. This approac...

Effective Straggler Mitigation: Attack of the Clones

Small jobs, that are typically run for interactive data analyses in datacenters, continue to be plagued by disproportionately long-running tasks called stragglers. In the production clusters at Facebook and Microsoft Bing, even after applying state-of-the-art straggler mitigation techniques, these latency sensitive jobs have stragglers that are on average 8 times slower than the median task in ...

Near-Optimal Straggler Mitigation for Distributed Gradient Methods

Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes compute partial gradients based on their partial and local data sets, and send the results to a master node where all the computations are aggregated into a f...

Publication date: 2015